1. Introduction

1.1 Overview

The novel coronavirus disease (COVID-19) is highly infectious and has led to a pandemic affecting almost every nation, placing an enormous economic, social and psychological burden on countries. According to the World Health Organization (WHO), millions of enterprises face an existential threat and nearly half of the world’s 3.3 billion global workforce are at risk of losing their livelihoods. Moreover, the economic and social disruption caused by the pandemic is devastating: tens of millions of people are at risk of falling into extreme poverty. While the COVID-19 pandemic has affected the global community in many ways, the US in particular suffered a dramatic loss of human life after underestimating the impact of the pandemic.

1.2 Location of US

The United States of America (the U.S.A. or the USA), commonly known as the United States (U.S. or US) or America, is a country primarily located in North America. It consists of 50 states, a federal district, five major unincorporated territories, 326 Indian reservations, and some minor possessions. At 3.8 million square miles (9.8 million square kilometers), it is the world’s third- or fourth-largest country by total area. The United States shares significant land borders with Canada to the north and Mexico to the south, as well as limited maritime borders with the Bahamas, Cuba, and Russia. With a population of more than 331 million people, it is the third most populous country in the world. The national capital is Washington, D.C., and the most populous city is New York City (Wikipedia, 2021).

The United States of America is the world’s third-largest country in size and nearly the third largest in terms of population. Located in North America, the country is bordered on the west by the Pacific Ocean and to the east by the Atlantic Ocean. Along the northern border is Canada and the southern border is Mexico. There are 50 states and the District of Columbia.

Source: United States - National Geographic kids (2020)

1.3 Climate of US

Being a huge country, the contiguous United States is home to a wide variety of climates. However, in general, it has a continental climate, with cold winters (often frigid) and hot summers (sometimes very hot), with a different season duration depending on latitude and distance from the sea. There are, however, some exceptions: on the west coast overlooking the Pacific Ocean, the climate is cool and damp in the northern part and Mediterranean in the southern part; on the coast of the Gulf of Mexico, the climate is mild in winter and hot and muggy in summer, while in Florida, it is almost tropical; the mountainous areas are cold in winter and cool to cold even in summer; and finally, there are deserts which are mild in winter and scorchingly hot in summer. Since there are no obstacles to cold air masses from Canada, almost all of the country can experience sudden cold waves in winter, but they have different intensities and durations depending on the area. Cold spells last a few days in the south, where the temperature drops a few degrees below freezing (0 °C or 32 °F) in winter, while they are intense and sometimes long in inland areas, in the highlands and in the north-east. The summer heatwaves can be intense as well, especially in inland areas. In general, the western half of the country is more arid than the eastern one, with the exception of the north-central coast of the Pacific, which is rainy.

Source: Climates to travel (2020)

1.4 Lockdown status of the country by state (27/09/2021)

Image source: https://www.usatoday.com

1.5 Actions taken by the government to control COVID-19

The novel virus was first identified in Wuhan, China, in 2019. A lockdown in Wuhan and other cities in Hubei Province failed to contain the outbreak, and it spread to other parts of China and around the world. The World Health Organization (WHO) declared a Public Health Emergency of International Concern on 30 January 2020, and a pandemic on 11 March 2020. Since 2021, variants of the virus have emerged and become dominant in many countries, with the Delta, Alpha and Beta variants being the most virulent. As of 27 September 2021, more than 231 million cases and 4.74 million deaths had been confirmed, making it one of the deadliest pandemics in history. Most countries took immediate action from the beginning, whereas the initial U.S. response to the pandemic was weak. COVID-19 has since become the deadliest pandemic in American history, with over 688,000 deaths.

Let’s now look at the actions the U.S. government took to control COVID-19 as the pandemic expanded.

  • The first American case was reported in January 2020.

    - President Donald Trump declared the U.S. outbreak a public health emergency on January 31.

    - Restrictions were placed on flights arriving from China.

  • The first known American deaths occurred in February.

    - On March 6, 2020, Trump allocated $8.3 billion to fight the outbreak, and he declared a national emergency on March 13.

    - The government purchased large quantities of medical equipment.

  • By mid-April 2020, all states and territories had made disaster declarations as cases were increasing in all of them.

  • A second wave of infections began in June.

  • By mid-October 2020, a third surge of cases began.

  • Vaccines became available in December 2020 under emergency use, and one was officially approved by the FDA on August 23, 2021.

  • A fourth rise in infections began in late March 2021 with the rise of the Alpha variant.

Vaccination Program

Recent studies in the U.S. and in other countries have found that the COVID-19 vaccines available in the United States are highly protective against severe illness, hospitalization, and death due to COVID-19. Several vaccines have been approved and distributed in various countries, and the US started its own vaccination program in December after a slow kickoff. The country then surpassed President Biden’s initial goal of getting 100 million vaccines into arms in his first 100 days, reaching 200 million vaccines by day 92. Vaccine eligibility opened across the country to everyone 16 and up in mid-April, and expanded to children as young as 12 in mid-May. By July, the country had made significant progress, but still fell several million people short of President Biden’s goal of getting at least one shot to 70% of adults in the U.S. by Independence Day.

2. Exploratory Data Analysis

The analysis uses the open 2019 Novel Coronavirus COVID-19 (2019-nCoV) data set, which is available in R. This data set provides a summary of the coronavirus (COVID-19) cases by state/province. Data source: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus.

The available data set contains the reported cases of COVID-19 from 2020-01-22 to 2021-09-18. The complete coronavirus data set contains 498132 observations of 7 variables, which together form a daily summary of worldwide coronavirus cases. From the complete data set, we extracted the US COVID-19 data set for the analyses going forward. The US COVID-19 data set consists of 1818 observations of 7 variables. The 7 variables are as follows:

  1. Date - Format: “YYYY-MM-DD”
  2. Country - chr (data type)
  3. Province/state - chr (data type): depends upon availability
  4. Long - num (data type): longitude of the center of the country/province
  5. Lat - num (data type): latitude of the center of the country/province
  6. type - chr (data type): confirmed, recovered, death
  7. cases - int (data type): number of cases on the given date

This data set has daily information on the number of confirmed cases, deaths and recoveries. Now let’s start the exploratory data analysis of the US COVID-19 data from 2020-01-22 to 2021-09-18 using the R programming language.

Since our analysis is based on the US COVID-19 data, we filtered the US COVID-19 records from the coronavirus data set.

Now, let’s look at the summary of the US COVID-19 data set to get a brief overview of the available data.
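The filtering step described above could be done along these lines. This is a minimal sketch in base R on a small made-up data frame, not the code used for this report. The object names covid and us_covid are assumptions, the rows are invented, and only the column names are taken from the summary output in this section.

```r
# Toy stand-in for the full data set (hypothetical rows, for illustration only).
covid <- data.frame(
  date    = as.Date(c("2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23")),
  country = c("US", "Canada", "US", "Canada"),
  type    = c("confirmed", "confirmed", "confirmed", "confirmed"),
  cases   = c(1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Keep only the rows recorded for the US.
us_covid <- covid[covid$country == "US", ]

# Number of US observations, followed by a per-column overview.
nrow(us_covid)
summary(us_covid)
```

Row indexing with a logical condition keeps every column, so the filtered object has the same 7-variable layout as the full data set would.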

##       date              province           country               lat    
##  Min.   :2020-01-22   Length:1818        Length:1818        Min.   :40  
##  1st Qu.:2020-06-21   Class :character   Class :character   1st Qu.:40  
##  Median :2020-11-19   Mode  :character   Mode  :character   Median :40  
##  Mean   :2020-11-19                                         Mean   :40  
##  3rd Qu.:2021-04-20                                         3rd Qu.:40  
##  Max.   :2021-09-18                                         Max.   :40  
##                                                                         
##       long          type               cases         
##  Min.   :-100   Length:1818        Min.   :-6298082  
##  1st Qu.:-100   Class :character   1st Qu.:     246  
##  Median :-100   Mode  :character   Median :    2131  
##  Mean   :-100                      Mean   :   24097  
##  3rd Qu.:-100                      3rd Qu.:   32534  
##  Max.   :-100                      Max.   :  302959  
##                                    NA's   :45

The summary shows a negative value for the minimum number of cases (-6298082), which gives incorrect information about the number of cases for the analysis. By splitting the US COVID-19 data set on the column “type”, we obtained three data sets. Their summaries are given below, in the order confirmed, death, recovered.
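One way to perform such a split is sketched below in base R. The data frame is a made-up example and the names us_covid and by_type are assumptions. The point is only to show how splitting on type isolates the series that contains the negative value.

```r
# Hypothetical US data with one deliberately negative recovered count.
us_covid <- data.frame(
  date  = as.Date(rep(c("2020-03-01", "2020-03-02"), each = 3)),
  type  = rep(c("confirmed", "death", "recovered"), times = 2),
  cases = c(10L, 1L, 0L, 15L, 0L, -5L),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per value of type.
by_type <- split(us_covid, us_covid$type)

# Minimum of cases in each subset; a negative minimum flags a problem.
mins <- sapply(by_type, function(d) min(d$cases, na.rm = TRUE))
mins
```

Calling summary() on each element of by_type would give one summary per type, as in the three outputs in this section.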

##       date              province           country               lat    
##  Min.   :2020-01-22   Length:606         Length:606         Min.   :40  
##  1st Qu.:2020-06-21   Class :character   Class :character   1st Qu.:40  
##  Median :2020-11-19   Mode  :character   Mode  :character   Median :40  
##  Mean   :2020-11-19                                         Mean   :40  
##  3rd Qu.:2021-04-19                                         3rd Qu.:40  
##  Max.   :2021-09-18                                         Max.   :40  
##       long          type               cases       
##  Min.   :-100   Length:606         Min.   :     0  
##  1st Qu.:-100   Class :character   1st Qu.: 24265  
##  Median :-100   Mode  :character   Median : 46589  
##  Mean   :-100                      Mean   : 69390  
##  3rd Qu.:-100                      3rd Qu.: 79470  
##  Max.   :-100                      Max.   :302959
##       date              province           country               lat    
##  Min.   :2020-01-22   Length:606         Length:606         Min.   :40  
##  1st Qu.:2020-06-21   Class :character   Class :character   1st Qu.:40  
##  Median :2020-11-19   Mode  :character   Mode  :character   Median :40  
##  Mean   :2020-11-19                                         Mean   :40  
##  3rd Qu.:2021-04-19                                         3rd Qu.:40  
##  Max.   :2021-09-18                                         Max.   :40  
##       long          type               cases       
##  Min.   :-100   Length:606         Min.   :   0.0  
##  1st Qu.:-100   Class :character   1st Qu.: 429.2  
##  Median :-100   Mode  :character   Median : 881.0  
##  Mean   :-100                      Mean   :1111.3  
##  3rd Qu.:-100                      3rd Qu.:1513.8  
##  Max.   :-100                      Max.   :4460.0
##       date              province           country               lat    
##  Min.   :2020-01-22   Length:606         Length:606         Min.   :40  
##  1st Qu.:2020-06-21   Class :character   Class :character   1st Qu.:40  
##  Median :2020-11-19   Mode  :character   Mode  :character   Median :40  
##  Mean   :2020-11-19                                         Mean   :40  
##  3rd Qu.:2021-04-19                                         3rd Qu.:40  
##  Max.   :2021-09-18                                         Max.   :40  
##                                                                         
##       long          type               cases         
##  Min.   :-100   Length:606         Min.   :-6298082  
##  1st Qu.:-100   Class :character   1st Qu.:       0  
##  Median :-100   Mode  :character   Median :       0  
##  Mean   :-100                      Mean   :       0  
##  3rd Qu.:-100                      3rd Qu.:   16564  
##  Max.   :-100                      Max.   :  150267  
##                                    NA's   :45

We can see that the minimum in the recovered cases data set is the same negative value that we obtained previously, -6298082. The number of recovered COVID-19 cases can’t be negative, so some of the recovered counts in the COVID-19 data set must have been recorded incorrectly as negative values. Moreover, the 45 missing (NA) values in the cases column of the US COVID-19 data set match the 45 missing (NA) values in the recovered cases sub data set that we obtained by filtering on type (recovered, death, confirmed). This indicates that all the missing values in the cases column come from the incorrectly entered recovered cases. Next we cleaned the data set to remove missing values.
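The checks and the cleaning step described above can be sketched as follows. This is an illustration in base R on invented values, with us_covid, na_by_type, neg_by_type and us_clean as assumed names. It only drops rows with missing case counts, which is the cleaning the text describes, so negative values are left in place.

```r
# Hypothetical data: one missing and one negative recovered count.
us_covid <- data.frame(
  type  = c("confirmed", "death", "recovered", "recovered", "recovered"),
  cases = c(100L, 2L, 30L, NA, -40L),
  stringsAsFactors = FALSE
)

# Count missing values per type.
na_by_type <- tapply(is.na(us_covid$cases), us_covid$type, sum)

# Count negative values per type (NA entries are ignored).
neg_by_type <- tapply(us_covid$cases < 0, us_covid$type, sum, na.rm = TRUE)

# Drop the rows whose case count is missing.
us_clean <- us_covid[!is.na(us_covid$cases), ]

na_by_type
neg_by_type
```

Counting per type before dropping anything shows which series the missing and negative values belong to.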

According to the summary of the cleaned data set, the mean number of confirmed COVID-19 cases reported per day is 69390, and on average 1111 deaths have been reported daily due to COVID-19 in the US.

As Figure 1 shows, even though the first coronavirus case in the US was confirmed on 21st January 2020, cases surged from the second half of February and further in March as nation-wide testing increased significantly.

The time series plot in Figure 1 shows patterns caused by the successive, unexpected pandemic waves. The U.S. was hit by four pandemic waves during the period that we analysed (from 2020-01-22 to 2021-09-18).

We can clearly see the variations in confirmed cases during these four periods of the pandemic. The highest number of daily confirmed COVID-19 cases reported was 302959. Furthermore, the highest daily death count was 4460, which was recorded on 11th of December 2020 during the third wave of the infection.
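Daily means per type such as those quoted above can be computed in one call. The sketch below uses base R on invented numbers, and the names us_clean and daily_means are assumptions.

```r
# Hypothetical cleaned data: two days of confirmed cases and deaths.
us_clean <- data.frame(
  type  = c("confirmed", "confirmed", "death", "death"),
  cases = c(100, 300, 2, 4),
  stringsAsFactors = FALSE
)

# Mean of cases within each type.
daily_means <- tapply(us_clean$cases, us_clean$type, mean)
daily_means
```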
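A daily peak and the date on which it occurred can be located with which.max(). The following is a small base R sketch with made-up dates and counts, and the names deaths and peak are assumptions.

```r
# Hypothetical daily death counts (invented values).
deaths <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04")),
  cases = c(10, 25, 40, 15)
)

# which.max() returns the index of the first maximum; use it to pull the row.
peak <- deaths[which.max(deaths$cases), ]
peak
```

If several days tie for the maximum, only the earliest is returned, which is worth keeping in mind when reporting a single peak date.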

Figure 2: Comparison of US, China and India COVID-19 cases (daily)

The number of cases on any given day is the cumulative number. When analyzing the COVID-19 data of the US, it’s important to compare it with countries in relatively similar conditions. Here we compare US COVID-19 cases with those of India and China because these are the three most populous countries in the world, with China in first place. China was among the worst-affected countries at the start of the novel coronavirus outbreak and has witnessed more than 4,000 deaths. The number of cases in China has been on the decline since March 2020.

Moreover, it’s equally important to compare with the COVID-19 cases of countries located near the U.S. Here we compare US COVID-19 data with Canada and Mexico.

3. Conclusions and Discussion

3.1 Discussion

We observed negative values in the US COVID-19 data set. Further analysis showed that these negative values result from incorrectly entered daily recovered cases. Since the recovered cases data are not missing at random, mean substitution can’t be used in place of the incorrectly entered values of the variable cases. Since the mean is affected by outliers, it seems natural to use the median instead. But both the mean and the median we obtained for the recovered data set were zero.

According to Master’s in Data Science (2021), removing missing data may introduce bias. It is also not ideal to substitute zero for the negative values, as they make up a large proportion of the recovered data set. During the analysis we therefore focused on the death cases and the confirmed cases, to avoid misinterpretation caused by faults in the available data. Further study is needed on how to deal with a large number of negative values in the recovered data set that are also not random.

To get an overall idea of the COVID-19 situation in the US, it’s important to compare it with other countries. In this study we used the crude death rate as a measure to compare the impact of the COVID-19 pandemic on different countries. We considered Canada and Mexico because they are neighbouring countries, and China and India because, together with the US, they are the most populous countries in the world.

##   Country Total_Population Death_counts Crude_Death_rate
## 1      US        333434380       709119      212.6712308
## 2   China       1446260522         4636        0.3205508
## 3   India       1397131989       447406       32.0231734
## 4  Mexico        130645026       275676      211.0114778
## 5  Canada         38159048        27689       72.5620828

The US has the highest crude death rate among these 5 countries, at 212.67.

Even though China was hit hard by COVID-19 at the start, the number of cases in China has been on the decline since March 2020 due to the rapid control measures and response of the Chinese government.

It was widely known across the globe that India is also among the highly affected countries. There was a time when the scenes and stories coming out of India grew more heart-wrenching every day.
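The values in the table are consistent with a crude death rate expressed as deaths per 100,000 population. The base R sketch below recomputes the column from the populations and death counts in the table. The formula is inferred from those numbers rather than stated in the text, and the name rates is an assumption.

```r
# Populations and death counts copied from the table in this section.
rates <- data.frame(
  Country          = c("US", "China", "India", "Mexico", "Canada"),
  Total_Population = c(333434380, 1446260522, 1397131989, 130645026, 38159048),
  Death_counts     = c(709119, 4636, 447406, 275676, 27689),
  stringsAsFactors = FALSE
)

# Assumed definition: deaths per 100,000 population.
rates$Crude_Death_rate <- rates$Death_counts / rates$Total_Population * 1e5

# Sort from the highest to the lowest rate.
rates[order(-rates$Crude_Death_rate), ]
```

Dividing by population puts countries of very different sizes on a common scale, which is why Mexico ranks close to the US despite far fewer total deaths.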

3.2 Conclusions

In line with our findings, we can conclude that the US is the country most affected by the spread of COVID-19. Community spread and delayed testing have been major concerns for Americans. The United States remains the world’s worst-affected country because it underestimated the spike of the outbreak and did not take the necessary actions at the beginning. Furthermore, the US has still not completely recovered from the fourth wave of COVID-19 infections, with irregular variations in confirmed cases. However, the country has taken the necessary preventive measures to minimize the impact of the pandemic. The US government was not very active during the early stages of the pandemic, but it later took steps to help mitigate the spread of the virus.

4. References:

United States - Wikipedia (2021) Wikipedia. Available at: Wikipedia (Accessed: 27 Sept 2021).

United States - National Geographic kids (2020) United States - National Geographic kids. Available at: https://kids.nationalgeographic.com/geography/countries/article/united-states#:~:text=respectful%20of%20copyright.-,The%20United%20States%20of%20America%20is%20the%20world’s%20third%20largest,the%20southern%20border%20is%20Mexico. (Accessed: 27 Sept 2021).

Climates to travel (2020) Climates to travel. (Accessed: 28 Sept 2021).

World Health Organization (WHO) (2021) Impact of COVID-19 on people’s livelihoods, their health and our food systems. Available at: https://www.who.int/news/item/13-10-2020-impact-of-covid-19-on-people’s-livelihoods-their-health-and-our-food-systems (Accessed: 28 Sept 2021).

CRAN (2021) coronavirus: The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset. Available at: https://cran.r-project.org/web/packages/coronavirus/index.html (Accessed: 27 Sept 2021).

Worldometer (2021) Countries in the world by population. Available at: https://www.worldometers.info/world-population/population-by-country/ (Accessed: 27 Sept 2021).

Master’s in Data Science (2021) How to deal with missing data?. Available at: https://www.mastersindatascience.org/learning/how-to-deal-with-missing-data/ (Accessed: 27 Sept 2021).

NPR (2021) How Is The COVID-19 Vaccination Campaign Going In Your State? Available at: https://www.npr.org/sections/health-shots/2021/01/28/960901166/how-is-the-covid-19-vaccination-campaign-going-in-your-state (Accessed: 28 Sept 2021).